Which factors affect Airbnb Seattle pricing?


1. Introduction:

In this notebook we use the CRISP-DM methodology to explore and analyze the Airbnb Seattle dataset. We apply data wrangling and several data visualization techniques to present the findings. CRISP-DM (Cross-Industry Standard Process for Data Mining) is an industry-proven methodology for data mining that has been widely promoted by IBM. CRISP-DM consists of the following steps:

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Data Modeling
  • Evaluation
  • Deployment

2. Business Understanding:

Airbnb is the world leader in the lodging industry, with a turnover of USD 2.6B (2017). Airbnb is privately held and serves the world from San Francisco, California. It is crucial for such a service to understand why customers select one host over another: is it the area, the facilities and services, or other factors? It is also essential to understand how to set the price and which features or factors drive it.

In this report we will study and analyze Airbnb Seattle Dataset to answer the following questions:

  • Who are the top 15 hosts in Seattle?

  • Which are the top 15 booked neighbourhoods?

  • What are the top factors affecting the price?

  • What is the average price of accommodations, broken down by the previous factors?

To answer these questions we need a dataset. We obtained the Seattle Airbnb dataset from Kaggle.


3. Data Understanding

In this step we import the necessary Python libraries, load the datasets, and inspect them: viewing sample rows, checking the shape, the data types, and the missing and null values, and answering the basic questions.

In [582]:
# Import the necessary libraries

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from matplotlib.pyplot import plot
%matplotlib inline
import seaborn as sns
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)
In [583]:
# Import the data

listings=pd.read_csv('listings.csv')
listings_num=pd.read_csv('listings.csv')
reviews=pd.read_csv('reviews.csv')
calendar=pd.read_csv('calendar.csv')
In [584]:
# Checking the head and tail of listings table
listings.head()
Out[584]:
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview notes transit thumbnail_url medium_url picture_url xl_picture_url host_id host_url host_name host_since host_location host_about host_response_time host_response_rate host_acceptance_rate host_is_superhost host_thumbnail_url host_picture_url host_neighbourhood host_listings_count host_total_listings_count host_verifications host_has_profile_pic host_identity_verified street neighbourhood neighbourhood_cleansed neighbourhood_group_cleansed city state zipcode market smart_location country_code country latitude longitude is_location_exact property_type room_type accommodates bathrooms bedrooms beds bed_type amenities square_feet price weekly_price monthly_price security_deposit cleaning_fee guests_included extra_people minimum_nights maximum_nights calendar_updated has_availability availability_30 availability_60 availability_90 availability_365 calendar_last_scraped number_of_reviews first_review last_review review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value requires_license license jurisdiction_names instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month
0 241032 https://www.airbnb.com/rooms/241032 20160104002432 2016-01-04 Stylish Queen Anne Apartment NaN Make your self at home in this charming one-be... Make your self at home in this charming one-be... none NaN NaN NaN NaN NaN https://a1.muscache.com/ac/pictures/67560560/c... NaN 956883 https://www.airbnb.com/users/show/956883 Maija 2011-08-11 Seattle, Washington, United States I am an artist, interior designer, and run a s... within a few hours 96% 100% f https://a0.muscache.com/ac/users/956883/profil... https://a0.muscache.com/ac/users/956883/profil... Queen Anne 3.0 3.0 ['email', 'phone', 'reviews', 'kba'] t t Gilman Dr W, Seattle, WA 98119, United States Queen Anne West Queen Anne Queen Anne Seattle WA 98119 Seattle Seattle, WA US United States 47.636289 -122.371025 t Apartment Entire home/apt 4 1.0 1.0 1.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet","A... NaN $85.00 NaN NaN NaN NaN 2 $5.00 1 365 4 weeks ago t 14 41 71 346 2016-01-04 207 2011-11-01 2016-01-02 95.0 10.0 10.0 10.0 10.0 9.0 10.0 f NaN WASHINGTON f moderate f f 2 4.07
1 953595 https://www.airbnb.com/rooms/953595 20160104002432 2016-01-04 Bright & Airy Queen Anne Apartment Chemically sensitive? We've removed the irrita... Beautiful, hypoallergenic apartment in an extr... Chemically sensitive? We've removed the irrita... none Queen Anne is a wonderful, truly functional vi... What's up with the free pillows? Our home was... Convenient bus stops are just down the block, ... https://a0.muscache.com/ac/pictures/14409893/f... https://a0.muscache.com/im/pictures/14409893/f... https://a0.muscache.com/ac/pictures/14409893/f... https://a0.muscache.com/ac/pictures/14409893/f... 5177328 https://www.airbnb.com/users/show/5177328 Andrea 2013-02-21 Seattle, Washington, United States Living east coast/left coast/overseas. Time i... within an hour 98% 100% t https://a0.muscache.com/ac/users/5177328/profi... https://a0.muscache.com/ac/users/5177328/profi... Queen Anne 6.0 6.0 ['email', 'phone', 'facebook', 'linkedin', 're... t t 7th Avenue West, Seattle, WA 98119, United States Queen Anne West Queen Anne Queen Anne Seattle WA 98119 Seattle Seattle, WA US United States 47.639123 -122.365666 t Apartment Entire home/apt 4 1.0 1.0 1.0 Real Bed {TV,Internet,"Wireless Internet",Kitchen,"Free... NaN $150.00 $1,000.00 $3,000.00 $100.00 $40.00 1 $0.00 2 90 today t 13 13 16 291 2016-01-04 43 2013-08-19 2015-12-29 96.0 10.0 10.0 10.0 10.0 10.0 10.0 f NaN WASHINGTON f strict t t 6 1.48
2 3308979 https://www.airbnb.com/rooms/3308979 20160104002432 2016-01-04 New Modern House-Amazing water view New modern house built in 2013. Spectacular s... Our house is modern, light and fresh with a wa... New modern house built in 2013. Spectacular s... none Upper Queen Anne is a charming neighborhood fu... Our house is located just 5 short blocks to To... A bus stop is just 2 blocks away. Easy bus a... NaN NaN https://a2.muscache.com/ac/pictures/b4324e0f-a... NaN 16708587 https://www.airbnb.com/users/show/16708587 Jill 2014-06-12 Seattle, Washington, United States i love living in Seattle. i grew up in the mi... within a few hours 67% 100% f https://a1.muscache.com/ac/users/16708587/prof... https://a1.muscache.com/ac/users/16708587/prof... Queen Anne 2.0 2.0 ['email', 'phone', 'google', 'reviews', 'jumio'] t t West Lee Street, Seattle, WA 98119, United States Queen Anne West Queen Anne Queen Anne Seattle WA 98119 Seattle Seattle, WA US United States 47.629724 -122.369483 t House Entire home/apt 11 4.5 5.0 7.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet","A... NaN $975.00 NaN NaN $1,000.00 $300.00 10 $25.00 4 30 5 weeks ago t 1 6 17 220 2016-01-04 20 2014-07-30 2015-09-03 97.0 10.0 10.0 10.0 10.0 10.0 10.0 f NaN WASHINGTON f strict f f 2 1.15
3 7421966 https://www.airbnb.com/rooms/7421966 20160104002432 2016-01-04 Queen Anne Chateau A charming apartment that sits atop Queen Anne... NaN A charming apartment that sits atop Queen Anne... none NaN NaN NaN NaN NaN https://a0.muscache.com/ac/pictures/94146944/6... NaN 9851441 https://www.airbnb.com/users/show/9851441 Emily 2013-11-06 Seattle, Washington, United States NaN NaN NaN NaN f https://a2.muscache.com/ac/users/9851441/profi... https://a2.muscache.com/ac/users/9851441/profi... Queen Anne 1.0 1.0 ['email', 'phone', 'facebook', 'reviews', 'jum... t t 8th Avenue West, Seattle, WA 98119, United States Queen Anne West Queen Anne Queen Anne Seattle WA 98119 Seattle Seattle, WA US United States 47.638473 -122.369279 t Apartment Entire home/apt 3 1.0 0.0 2.0 Real Bed {Internet,"Wireless Internet",Kitchen,"Indoor ... NaN $100.00 $650.00 $2,300.00 NaN NaN 1 $0.00 1 1125 6 months ago t 0 0 0 143 2016-01-04 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN f NaN WASHINGTON f flexible f f 1 NaN
4 278830 https://www.airbnb.com/rooms/278830 20160104002432 2016-01-04 Charming craftsman 3 bdm house Cozy family craftman house in beautiful neighb... Cozy family craftman house in beautiful neighb... Cozy family craftman house in beautiful neighb... none We are in the beautiful neighborhood of Queen ... Belltown The nearest public transit bus (D Line) is 2 b... NaN NaN https://a1.muscache.com/ac/pictures/6120468/b0... NaN 1452570 https://www.airbnb.com/users/show/1452570 Emily 2011-11-29 Seattle, Washington, United States Hi, I live in Seattle, Washington but I'm orig... within an hour 100% NaN f https://a0.muscache.com/ac/users/1452570/profi... https://a0.muscache.com/ac/users/1452570/profi... Queen Anne 2.0 2.0 ['email', 'phone', 'facebook', 'reviews', 'kba'] t t 14th Ave W, Seattle, WA 98119, United States Queen Anne West Queen Anne Queen Anne Seattle WA 98119 Seattle Seattle, WA US United States 47.632918 -122.372471 t House Entire home/apt 6 2.0 3.0 3.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet",Ki... NaN $450.00 NaN NaN $700.00 $125.00 6 $15.00 1 1125 7 weeks ago t 30 60 90 365 2016-01-04 38 2012-07-10 2015-10-24 92.0 9.0 9.0 10.0 10.0 9.0 9.0 f NaN WASHINGTON f strict f f 1 0.89
In [585]:
listings.tail()
Out[585]:
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview notes transit thumbnail_url medium_url picture_url xl_picture_url host_id host_url host_name host_since host_location host_about host_response_time host_response_rate host_acceptance_rate host_is_superhost host_thumbnail_url host_picture_url host_neighbourhood host_listings_count host_total_listings_count host_verifications host_has_profile_pic host_identity_verified street neighbourhood neighbourhood_cleansed neighbourhood_group_cleansed city state zipcode market smart_location country_code country latitude longitude is_location_exact property_type room_type accommodates bathrooms bedrooms beds bed_type amenities square_feet price weekly_price monthly_price security_deposit cleaning_fee guests_included extra_people minimum_nights maximum_nights calendar_updated has_availability availability_30 availability_60 availability_90 availability_365 calendar_last_scraped number_of_reviews first_review last_review review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value requires_license license jurisdiction_names instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month
3813 8101950 https://www.airbnb.com/rooms/8101950 20160104002432 2016-01-04 3BR Mountain View House in Seattle Our 3BR/2BA house boasts incredible views of t... Our 3BR/2BA house bright, stylish, and wheelch... Our 3BR/2BA house boasts incredible views of t... none We're located near lots of family fun. Woodlan... NaN NaN https://a2.muscache.com/ac/pictures/103217071/... https://a2.muscache.com/im/pictures/103217071/... https://a2.muscache.com/ac/pictures/103217071/... https://a2.muscache.com/ac/pictures/103217071/... 31148752 https://www.airbnb.com/users/show/31148752 Bo 2015-04-13 US NaN within a few hours 99% 100% f https://a2.muscache.com/ac/users/31148752/prof... https://a2.muscache.com/ac/users/31148752/prof... Holly 354.0 354.0 ['email', 'phone', 'linkedin', 'reviews', 'jum... t t Northwest 48th Street, Seattle, WA 98107, Unit... Fremont Fremont Other neighborhoods Seattle WA 98107 Seattle Seattle, WA US United States 47.664295 -122.359170 t House Entire home/apt 6 2.0 3.0 3.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet","A... NaN $359.00 NaN NaN NaN $230.00 1 $0.00 3 1125 today t 18 32 32 32 2016-01-04 1 2015-09-27 2015-09-27 80.0 8.0 10.0 4.0 8.0 10.0 8.0 f NaN WASHINGTON f strict f f 8 0.3
3814 8902327 https://www.airbnb.com/rooms/8902327 20160104002432 2016-01-04 Portage Bay View!-One Bedroom Apt 800 square foot 1 bedroom basement apartment w... This space has a great view of Portage Bay wit... 800 square foot 1 bedroom basement apartment w... none The neighborhood is a quiet oasis that is clos... This is a basement apartment in a newer reside... Uber and Car2go are good options in Seattle. T... https://a2.muscache.com/ac/pictures/626d4b1f-6... https://a2.muscache.com/im/pictures/626d4b1f-6... https://a2.muscache.com/ac/pictures/626d4b1f-6... https://a2.muscache.com/ac/pictures/626d4b1f-6... 46566046 https://www.airbnb.com/users/show/46566046 Glen 2015-10-14 Seattle, Washington, United States I am a 58 year old male that is married to Mag... within an hour 100% 100% f https://a2.muscache.com/ac/pictures/d7e59b0d-8... https://a2.muscache.com/ac/pictures/d7e59b0d-8... Portage Bay 1.0 1.0 ['email', 'phone', 'facebook', 'reviews', 'jum... t t Fuhrman Avenue East, Seattle, WA 98102, United... Portage Bay Portage Bay Capitol Hill Seattle WA 98102 Seattle Seattle, WA US United States 47.649552 -122.318309 t Apartment Entire home/apt 4 1.0 1.0 2.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet",Ki... NaN $79.00 NaN NaN $500.00 $50.00 3 $25.00 2 29 2 days ago t 6 26 44 273 2016-01-04 2 2015-12-18 2015-12-24 100.0 10.0 10.0 10.0 10.0 10.0 10.0 f NaN WASHINGTON f moderate f f 1 2.0
3815 10267360 https://www.airbnb.com/rooms/10267360 20160104002432 2016-01-04 Private apartment view of Lake WA Very comfortable lower unit. Quiet, charming m... NaN Very comfortable lower unit. Quiet, charming m... none NaN NaN NaN https://a2.muscache.com/ac/pictures/a5974f04-2... https://a2.muscache.com/im/pictures/a5974f04-2... https://a2.muscache.com/ac/pictures/a5974f04-2... https://a2.muscache.com/ac/pictures/a5974f04-2... 52791370 https://www.airbnb.com/users/show/52791370 Virginia 2015-12-30 US NaN NaN NaN NaN f https://a2.muscache.com/ac/pictures/efc75826-1... https://a2.muscache.com/ac/pictures/efc75826-1... NaN 1.0 1.0 ['phone'] t f South Laurel Street, Seattle, WA 98178, United... NaN Rainier Beach Rainier Valley Seattle WA 98178 Seattle Seattle, WA US United States 47.508453 -122.240607 f House Entire home/apt 2 1.0 1.0 1.0 Real Bed {"Cable TV","Wireless Internet",Kitchen,"Free ... NaN $93.00 $450.00 NaN $250.00 $35.00 2 $20.00 1 7 4 days ago t 29 59 88 88 2016-01-04 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN f NaN WASHINGTON f moderate f f 1 NaN
3816 9604740 https://www.airbnb.com/rooms/9604740 20160104002432 2016-01-04 Amazing View with Modern Comfort! Cozy studio condo in the heart on Madison Park... Fully furnished unit to accommodate most needs... Cozy studio condo in the heart on Madison Park... none Madison Park offers a peaceful slow pace upsca... NaN Yes https://a2.muscache.com/ac/pictures/202e4ad6-b... https://a2.muscache.com/im/pictures/202e4ad6-b... https://a2.muscache.com/ac/pictures/202e4ad6-b... https://a2.muscache.com/ac/pictures/202e4ad6-b... 25522052 https://www.airbnb.com/users/show/25522052 Karen 2015-01-03 Tacoma, Washington, United States NaN within an hour 100% NaN f https://a0.muscache.com/ac/users/25522052/prof... https://a0.muscache.com/ac/users/25522052/prof... NaN 1.0 1.0 ['email', 'phone', 'facebook', 'reviews', 'kba'] t t 43rd Avenue East, Seattle, WA 98112, United St... NaN Madison Park Capitol Hill Seattle WA 98112 Seattle Seattle, WA US United States 47.632335 -122.275530 f Condominium Entire home/apt 2 1.0 0.0 1.0 Real Bed {TV,"Wireless Internet",Kitchen,"Free Parking ... NaN $99.00 NaN NaN $300.00 $45.00 1 $0.00 3 1125 never t 30 60 90 179 2016-01-04 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN f NaN WASHINGTON f moderate f f 1 NaN
3817 10208623 https://www.airbnb.com/rooms/10208623 20160104002432 2016-01-04 Large Lakefront Apartment All hardwood floors, fireplace, 65" TV with Xb... NaN All hardwood floors, fireplace, 65" TV with Xb... none NaN Also our puppy will be boarded away. NaN https://a2.muscache.com/ac/pictures/596705b3-0... https://a2.muscache.com/im/pictures/596705b3-0... https://a2.muscache.com/ac/pictures/596705b3-0... https://a2.muscache.com/ac/pictures/596705b3-0... 14703116 https://www.airbnb.com/users/show/14703116 Gil 2014-04-25 Seattle, Washington, United States NaN within a day 100% NaN f https://a2.muscache.com/ac/pictures/8864f979-b... https://a2.muscache.com/ac/pictures/8864f979-b... Queen Anne 1.0 1.0 ['email', 'phone', 'reviews', 'kba'] t t Westlake Avenue North, Seattle, WA 98109, Unit... Queen Anne East Queen Anne Queen Anne Seattle WA 98109 Seattle Seattle, WA US United States 47.641186 -122.342085 t Apartment Entire home/apt 3 1.5 2.0 1.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet",Ki... NaN $87.00 NaN NaN NaN NaN 1 $0.00 1 1125 a week ago t 7 7 7 7 2016-01-04 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN f NaN WASHINGTON f flexible f f 1 NaN
In [586]:
listings.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 92 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   id                                3818 non-null   int64  
 1   listing_url                       3818 non-null   object 
 2   scrape_id                         3818 non-null   int64  
 3   last_scraped                      3818 non-null   object 
 4   name                              3818 non-null   object 
 5   summary                           3641 non-null   object 
 6   space                             3249 non-null   object 
 7   description                       3818 non-null   object 
 8   experiences_offered               3818 non-null   object 
 9   neighborhood_overview             2786 non-null   object 
 10  notes                             2212 non-null   object 
 11  transit                           2884 non-null   object 
 12  thumbnail_url                     3498 non-null   object 
 13  medium_url                        3498 non-null   object 
 14  picture_url                       3818 non-null   object 
 15  xl_picture_url                    3498 non-null   object 
 16  host_id                           3818 non-null   int64  
 17  host_url                          3818 non-null   object 
 18  host_name                         3816 non-null   object 
 19  host_since                        3816 non-null   object 
 20  host_location                     3810 non-null   object 
 21  host_about                        2959 non-null   object 
 22  host_response_time                3295 non-null   object 
 23  host_response_rate                3295 non-null   object 
 24  host_acceptance_rate              3045 non-null   object 
 25  host_is_superhost                 3816 non-null   object 
 26  host_thumbnail_url                3816 non-null   object 
 27  host_picture_url                  3816 non-null   object 
 28  host_neighbourhood                3518 non-null   object 
 29  host_listings_count               3816 non-null   float64
 30  host_total_listings_count         3816 non-null   float64
 31  host_verifications                3818 non-null   object 
 32  host_has_profile_pic              3816 non-null   object 
 33  host_identity_verified            3816 non-null   object 
 34  street                            3818 non-null   object 
 35  neighbourhood                     3402 non-null   object 
 36  neighbourhood_cleansed            3818 non-null   object 
 37  neighbourhood_group_cleansed      3818 non-null   object 
 38  city                              3818 non-null   object 
 39  state                             3818 non-null   object 
 40  zipcode                           3811 non-null   object 
 41  market                            3818 non-null   object 
 42  smart_location                    3818 non-null   object 
 43  country_code                      3818 non-null   object 
 44  country                           3818 non-null   object 
 45  latitude                          3818 non-null   float64
 46  longitude                         3818 non-null   float64
 47  is_location_exact                 3818 non-null   object 
 48  property_type                     3817 non-null   object 
 49  room_type                         3818 non-null   object 
 50  accommodates                      3818 non-null   int64  
 51  bathrooms                         3802 non-null   float64
 52  bedrooms                          3812 non-null   float64
 53  beds                              3817 non-null   float64
 54  bed_type                          3818 non-null   object 
 55  amenities                         3818 non-null   object 
 56  square_feet                       97 non-null     float64
 57  price                             3818 non-null   object 
 58  weekly_price                      2009 non-null   object 
 59  monthly_price                     1517 non-null   object 
 60  security_deposit                  1866 non-null   object 
 61  cleaning_fee                      2788 non-null   object 
 62  guests_included                   3818 non-null   int64  
 63  extra_people                      3818 non-null   object 
 64  minimum_nights                    3818 non-null   int64  
 65  maximum_nights                    3818 non-null   int64  
 66  calendar_updated                  3818 non-null   object 
 67  has_availability                  3818 non-null   object 
 68  availability_30                   3818 non-null   int64  
 69  availability_60                   3818 non-null   int64  
 70  availability_90                   3818 non-null   int64  
 71  availability_365                  3818 non-null   int64  
 72  calendar_last_scraped             3818 non-null   object 
 73  number_of_reviews                 3818 non-null   int64  
 74  first_review                      3191 non-null   object 
 75  last_review                       3191 non-null   object 
 76  review_scores_rating              3171 non-null   float64
 77  review_scores_accuracy            3160 non-null   float64
 78  review_scores_cleanliness         3165 non-null   float64
 79  review_scores_checkin             3160 non-null   float64
 80  review_scores_communication       3167 non-null   float64
 81  review_scores_location            3163 non-null   float64
 82  review_scores_value               3162 non-null   float64
 83  requires_license                  3818 non-null   object 
 84  license                           0 non-null      float64
 85  jurisdiction_names                3818 non-null   object 
 86  instant_bookable                  3818 non-null   object 
 87  cancellation_policy               3818 non-null   object 
 88  require_guest_profile_picture     3818 non-null   object 
 89  require_guest_phone_verification  3818 non-null   object 
 90  calculated_host_listings_count    3818 non-null   int64  
 91  reviews_per_month                 3191 non-null   float64
dtypes: float64(17), int64(13), object(62)
memory usage: 2.7+ MB

The listings dataset has 3,818 records (rows) and 92 features (columns). In terms of data types, there are 17 float columns, 13 int columns and 62 object columns. Comparing the non-null counts with the number of records shows that several columns contain missing values; for example, license has no values at all and square_feet has only 97.
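The figures read off `info()` above can also be collected programmatically. The following is a minimal sketch, not part of the original notebook: the helper name `summarize` and the three-row `toy` frame are hypothetical stand-ins for `listings`.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Collect row/column counts, dtype counts, and per-column missing counts."""
    missing = df.isnull().sum()
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Number of columns per dtype, e.g. how many float64 columns there are
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
        # Only the columns that actually have missing values
        "missing": missing[missing > 0].to_dict(),
    }

# Hypothetical toy frame standing in for listings (values are illustrative)
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "price": ["$85.00", "$150.00", "$975.00"],
    "square_feet": [None, 800.0, None],
})

summary = summarize(toy)
print(summary)
```

Calling `summarize(listings)` on the real table should give the same counts that were read manually from the `info()` output.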

In [587]:
# Checking the head of the reviews table
reviews.head()
Out[587]:
listing_id id date reviewer_id reviewer_name comments
0 7202016 38917982 2015-07-19 28943674 Bianca Cute and cozy place. Perfect location to every...
1 7202016 39087409 2015-07-20 32440555 Frank Kelly has a great room in a very central locat...
2 7202016 39820030 2015-07-26 37722850 Ian Very spacious apartment, and in a great neighb...
3 7202016 40813543 2015-08-02 33671805 George Close to Seattle Center and all it has to offe...
4 7202016 41986501 2015-08-10 34959538 Ming Kelly was a great host and very accommodating ...
In [588]:
reviews.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 84849 entries, 0 to 84848
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   listing_id     84849 non-null  int64 
 1   id             84849 non-null  int64 
 2   date           84849 non-null  object
 3   reviewer_id    84849 non-null  int64 
 4   reviewer_name  84849 non-null  object
 5   comments       84831 non-null  object
dtypes: int64(3), object(3)
memory usage: 3.9+ MB

The reviews dataset has 84,849 records (rows) and 6 features (columns): 3 integer columns and 3 object columns. The comments feature has 18 missing values (84,831 non-null out of 84,849).
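Since `date` is stored as an object and a few `comments` are missing, one possible cleanup is sketched below. This is an illustration only: `toy_reviews` is a hypothetical three-row frame with the same column names as `reviews`, and dropping rows without comments is an assumption, not a step taken in this notebook.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as the reviews table
toy_reviews = pd.DataFrame({
    "listing_id": [7202016, 7202016, 7202016],
    "date": ["2015-07-19", "2015-07-20", "2015-07-26"],
    "comments": ["Cute and cozy place.", None, "Very spacious apartment."],
})

# Parse the ISO-formatted date strings into proper datetime values
toy_reviews["date"] = pd.to_datetime(toy_reviews["date"], format="%Y-%m-%d")

# Count the reviews with no comment text, then keep only the ones that have text
n_missing = int(toy_reviews["comments"].isnull().sum())
with_text = toy_reviews.dropna(subset=["comments"])

print(n_missing, len(with_text))
```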

In [589]:
# Checking the head of the calendar table
calendar.head()
Out[589]:
listing_id date available price
0 241032 2016-01-04 t $85.00
1 241032 2016-01-05 t $85.00
2 241032 2016-01-06 f NaN
3 241032 2016-01-07 f NaN
4 241032 2016-01-08 f NaN
In [590]:
calendar.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1393570 entries, 0 to 1393569
Data columns (total 4 columns):
 #   Column      Non-Null Count    Dtype 
---  ------      --------------    ----- 
 0   listing_id  1393570 non-null  int64 
 1   date        1393570 non-null  object
 2   available   1393570 non-null  object
 3   price       934542 non-null   object
dtypes: int64(1), object(3)
memory usage: 42.5+ MB

The calendar dataset has 1,393,570 records (rows) and 4 features (columns): 1 integer column and 3 object columns. The price column has 459,028 missing values (934,542 non-null); in the sample above, price is NaN on the dates where available is f.
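Because `price` is stored as text such as `$85.00`, it has to be converted before any numeric analysis. The sketch below rebuilds the five rows shown in `calendar.head()` and shows one way to do the conversion and to check how missing prices relate to availability. The helper name `price_to_float` and the column `price_num` are hypothetical.

```python
import pandas as pd

# The five rows shown by calendar.head() above
toy_calendar = pd.DataFrame({
    "listing_id": [241032] * 5,
    "date": ["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-07", "2016-01-08"],
    "available": ["t", "t", "f", "f", "f"],
    "price": ["$85.00", "$85.00", None, None, None],
})

def price_to_float(s: pd.Series) -> pd.Series:
    """Strip '$' and thousands separators, then convert to float (missing stays NaN)."""
    return pd.to_numeric(s.str.replace(r"[$,]", "", regex=True), errors="coerce")

toy_calendar["price_num"] = price_to_float(toy_calendar["price"])

# Number of missing prices for each availability flag
missing_by_availability = (
    toy_calendar["price_num"].isnull().groupby(toy_calendar["available"]).sum()
)
print(missing_by_availability)
```

The same helper would apply to the `price`, `weekly_price`, `monthly_price`, `security_deposit`, `cleaning_fee` and `extra_people` columns of `listings`, which use the same text format.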

In [591]:
# Review the data with descriptive statistics for the listings
listings.describe()
Out[591]:
id scrape_id host_id host_listings_count host_total_listings_count latitude longitude accommodates bathrooms bedrooms beds square_feet guests_included minimum_nights maximum_nights availability_30 availability_60 availability_90 availability_365 number_of_reviews review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value license calculated_host_listings_count reviews_per_month
count 3.818000e+03 3.818000e+03 3.818000e+03 3816.000000 3816.000000 3818.000000 3818.000000 3818.000000 3802.000000 3812.000000 3817.000000 97.000000 3818.000000 3818.000000 3818.000000 3818.000000 3818.000000 3818.000000 3818.000000 3818.000000 3171.000000 3160.000000 3165.000000 3160.000000 3167.000000 3163.000000 3162.000000 0.0 3818.000000 3191.000000
mean 5.550111e+06 2.016010e+13 1.578556e+07 7.157757 7.157757 47.628961 -122.333103 3.349398 1.259469 1.307712 1.735394 854.618557 1.672603 2.369303 780.447617 16.786276 36.814825 58.082504 244.772656 22.223415 94.539262 9.636392 9.556398 9.786709 9.809599 9.608916 9.452245 NaN 2.946307 2.078919
std 2.962660e+06 0.000000e+00 1.458382e+07 28.628149 28.628149 0.043052 0.031745 1.977599 0.590369 0.883395 1.139480 671.404893 1.311040 16.305902 1683.589007 12.173637 23.337541 34.063845 126.772526 37.730892 6.606083 0.698031 0.797274 0.595499 0.568211 0.629053 0.750259 NaN 5.893029 1.822348
min 3.335000e+03 2.016010e+13 4.193000e+03 1.000000 1.000000 47.505088 -122.417219 1.000000 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 20.000000 2.000000 3.000000 2.000000 2.000000 4.000000 2.000000 NaN 1.000000 0.020000
25% 3.258256e+06 2.016010e+13 3.275204e+06 1.000000 1.000000 47.609418 -122.354320 2.000000 1.000000 1.000000 1.000000 420.000000 1.000000 1.000000 60.000000 2.000000 13.000000 28.000000 124.000000 2.000000 93.000000 9.000000 9.000000 10.000000 10.000000 9.000000 9.000000 NaN 1.000000 0.695000
50% 6.118244e+06 2.016010e+13 1.055814e+07 1.000000 1.000000 47.623601 -122.328874 3.000000 1.000000 1.000000 1.000000 750.000000 1.000000 2.000000 1125.000000 20.000000 46.000000 73.000000 308.000000 9.000000 96.000000 10.000000 10.000000 10.000000 10.000000 10.000000 10.000000 NaN 1.000000 1.540000
75% 8.035127e+06 2.016010e+13 2.590309e+07 3.000000 3.000000 47.662694 -122.310800 4.000000 1.000000 2.000000 2.000000 1200.000000 2.000000 2.000000 1125.000000 30.000000 59.000000 89.000000 360.000000 26.000000 99.000000 10.000000 10.000000 10.000000 10.000000 10.000000 10.000000 NaN 2.000000 3.000000
max 1.034016e+07 2.016010e+13 5.320861e+07 502.000000 502.000000 47.733358 -122.240607 16.000000 8.000000 7.000000 15.000000 3000.000000 15.000000 1000.000000 100000.000000 30.000000 60.000000 90.000000 365.000000 474.000000 100.000000 10.000000 10.000000 10.000000 10.000000 10.000000 10.000000 NaN 37.000000 12.150000
In [592]:
reviews.describe()
Out[592]:
listing_id id reviewer_id
count 8.484900e+04 8.484900e+04 8.484900e+04
mean 3.005067e+06 3.058765e+07 1.701301e+07
std 2.472877e+06 1.636613e+07 1.353704e+07
min 4.291000e+03 3.721000e+03 1.500000e+01
25% 7.946330e+05 1.725127e+07 5.053141e+06
50% 2.488228e+06 3.228809e+07 1.413476e+07
75% 4.694479e+06 4.457648e+07 2.762402e+07
max 1.024814e+07 5.873651e+07 5.281274e+07
In [593]:
calendar.describe()
Out[593]:
listing_id
count 1.393570e+06
mean 5.550111e+06
std 2.962274e+06
min 3.335000e+03
25% 3.258213e+06
50% 6.118244e+06
75% 8.035212e+06
max 1.034016e+07

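From the above we can see many data type inconsistencies: for example, price is stored as an object (text such as $85.00), host_response_rate is a percentage string, and dates are stored as objects as well. The data will require cleaning and wrangling to handle missing and NaN values and replicated columns (host_listings_count and host_total_listings_count show identical summary statistics).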
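The inconsistencies noted above can be detected in code rather than by eye. The following sketch assumes a hypothetical `toy` frame built from a few values visible in the `listings.head()` output. It flags columns with at most one distinct value, finds pairs of identical columns, and converts percentage strings to numbers.

```python
import pandas as pd

# Hypothetical toy frame using a few values visible in listings.head()
toy = pd.DataFrame({
    "host_listings_count": [3.0, 6.0, 2.0],
    "host_total_listings_count": [3.0, 6.0, 2.0],
    "scrape_id": [20160104002432] * 3,
    "license": [None, None, None],
    "host_response_rate": ["96%", "98%", "67%"],
})

# Columns with at most one distinct non-null value carry no information
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=True) <= 1]

# Pairs of columns whose contents are identical
cols = list(toy.columns)
duplicate_pairs = [
    (a, b)
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if toy[a].equals(toy[b])
]

# Percentage strings such as "96%" become floats on a 0-100 scale
toy["host_response_rate"] = toy["host_response_rate"].str.rstrip("%").astype(float)

print(constant_cols)
print(duplicate_pairs)
```

Whether the two host listing counts are identical for every row of the real table is something this check would confirm; the summary statistics alone only suggest it.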


4. Data Preparation:

In this step, the necessary data is gathered to answer our questions. Replicated and unnecessary columns are dropped, missing and NaN values are fixed, and categorical and other data types are handled.

We will use only the listings table, which contains the information required to answer our questions.
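One common way to carry out the dropping and fixing described above is sketched here. The helper name `drop_sparse_columns`, the 50% threshold, and the median imputation are assumptions chosen for illustration, not choices confirmed by this notebook, and the `toy` frame holds hypothetical values.

```python
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop columns whose share of missing values exceeds max_missing."""
    missing_share = df.isnull().mean()
    keep = missing_share[missing_share <= max_missing].index
    return df[keep]

# Hypothetical toy frame (values are illustrative, not taken from the dataset)
toy = pd.DataFrame({
    "price": ["$85.00", "$150.00", "$975.00", "$100.00"],
    "square_feet": [None, None, None, 800.0],  # 75% missing, will be dropped
    "bathrooms": [1.0, None, 4.5, 1.0],        # 25% missing, will be kept
})

cleaned = drop_sparse_columns(toy, max_missing=0.5)

# Fill the remaining gaps in a numeric column with its median (an assumed strategy)
cleaned = cleaned.assign(
    bathrooms=cleaned["bathrooms"].fillna(cleaned["bathrooms"].median())
)
print(cleaned)
```

A lower threshold removes more columns but keeps the remaining data denser; the right value depends on which features the later questions need.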

In [594]:
# Checking the % of the null values in every column of the listings dataset
(listings.isnull().sum()/len(listings)*100).sort_values(ascending=False)
Out[594]:
license                             100.000000
square_feet                          97.459403
monthly_price                        60.267156
security_deposit                     51.126244
weekly_price                         47.380828
notes                                42.063908
neighborhood_overview                27.029859
cleaning_fee                         26.977475
transit                              24.463070
host_about                           22.498690
host_acceptance_rate                 20.246202
review_scores_accuracy               17.234154
review_scores_checkin                17.234154
review_scores_value                  17.181771
review_scores_location               17.155579
review_scores_cleanliness            17.103195
review_scores_communication          17.050812
review_scores_rating                 16.946045
reviews_per_month                    16.422211
first_review                         16.422211
last_review                          16.422211
space                                14.903091
host_response_time                   13.698271
host_response_rate                   13.698271
neighbourhood                        10.895757
xl_picture_url                        8.381351
thumbnail_url                         8.381351
medium_url                            8.381351
host_neighbourhood                    7.857517
summary                               4.635935
bathrooms                             0.419068
host_location                         0.209534
zipcode                               0.183342
bedrooms                              0.157150
host_identity_verified                0.052383
host_has_profile_pic                  0.052383
host_picture_url                      0.052383
host_since                            0.052383
host_total_listings_count             0.052383
host_listings_count                   0.052383
host_thumbnail_url                    0.052383
host_name                             0.052383
host_is_superhost                     0.052383
beds                                  0.026192
property_type                         0.026192
host_verifications                    0.000000
host_url                              0.000000
host_id                               0.000000
picture_url                           0.000000
experiences_offered                   0.000000
description                           0.000000
name                                  0.000000
last_scraped                          0.000000
scrape_id                             0.000000
listing_url                           0.000000
street                                0.000000
latitude                              0.000000
neighbourhood_cleansed                0.000000
calendar_last_scraped                 0.000000
calendar_updated                      0.000000
has_availability                      0.000000
availability_30                       0.000000
availability_60                       0.000000
availability_90                       0.000000
availability_365                      0.000000
number_of_reviews                     0.000000
minimum_nights                        0.000000
requires_license                      0.000000
jurisdiction_names                    0.000000
instant_bookable                      0.000000
cancellation_policy                   0.000000
require_guest_profile_picture         0.000000
require_guest_phone_verification      0.000000
maximum_nights                        0.000000
extra_people                          0.000000
neighbourhood_group_cleansed          0.000000
calculated_host_listings_count        0.000000
city                                  0.000000
state                                 0.000000
market                                0.000000
smart_location                        0.000000
country_code                          0.000000
country                               0.000000
longitude                             0.000000
guests_included                       0.000000
is_location_exact                     0.000000
room_type                             0.000000
accommodates                          0.000000
bed_type                              0.000000
amenities                             0.000000
price                                 0.000000
id                                    0.000000
dtype: float64

From the output above we can see that 11 columns (license down to host_acceptance_rate) have more than 20% missing values

In [595]:
# Checking the % of the null values in every column of the calendar dataset
(calendar.isnull().sum()/len(calendar)*100).sort_values(ascending=False)
Out[595]:
price         32.938998
available      0.000000
date           0.000000
listing_id     0.000000
dtype: float64
In [596]:
# Checking the % of the null values in every column > 20%
# (listings.isnull().sum()/len(listings)*100).sort_values(ascending=False) >20

Dropping the columns with > 20% missing values for both calendar and listings datasets

In [597]:
# dropping the columns where missing values > 20%
listings = listings.drop(columns=['neighborhood_overview', 'cleaning_fee', 'transit', 'host_about', 'host_acceptance_rate', 'license', 'square_feet', 'monthly_price', 'security_deposit', 'weekly_price', 'notes'])
In [598]:
# dropping the columns where missing values > 20%
calendar = calendar.drop(columns=['price'])
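
The column lists in the two cells above were typed by hand from the percentage output. As a minimal sketch, the same selection can be derived from the threshold itself. The helper name drop_sparse_columns and the toy DataFrame are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, threshold=20.0):
    """Drop columns whose share of missing values exceeds `threshold` percent."""
    # isnull().mean() gives the fraction of missing values per column
    missing_pct = df.isnull().mean() * 100
    sparse_cols = missing_pct[missing_pct > threshold].index.tolist()
    return df.drop(columns=sparse_cols), sparse_cols

# Toy data standing in for the listings dataset (hypothetical values)
toy = pd.DataFrame({
    "license": [np.nan, np.nan, np.nan, np.nan, np.nan],   # 100% missing
    "cleaning_fee": [10.0, np.nan, np.nan, 25.0, 30.0],    # 40% missing
    "price": [85.0, 150.0, 975.0, 100.0, 450.0],           # 0% missing
})

trimmed, dropped = drop_sparse_columns(toy, threshold=20.0)
print(dropped)
print(trimmed.columns.tolist())
```

Deriving the list this way keeps the drop consistent with the 20% rule if the dataset or the threshold changes.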
In [599]:
# Checking the columns with missing values again in listings dataset
(listings.isnull().sum()/len(listings)*100).sort_values(ascending=False)
Out[599]:
review_scores_accuracy              17.234154
review_scores_checkin               17.234154
review_scores_value                 17.181771
review_scores_location              17.155579
review_scores_cleanliness           17.103195
review_scores_communication         17.050812
review_scores_rating                16.946045
first_review                        16.422211
last_review                         16.422211
reviews_per_month                   16.422211
space                               14.903091
host_response_rate                  13.698271
host_response_time                  13.698271
neighbourhood                       10.895757
thumbnail_url                        8.381351
medium_url                           8.381351
xl_picture_url                       8.381351
host_neighbourhood                   7.857517
summary                              4.635935
bathrooms                            0.419068
host_location                        0.209534
zipcode                              0.183342
bedrooms                             0.157150
host_since                           0.052383
host_thumbnail_url                   0.052383
host_is_superhost                    0.052383
host_picture_url                     0.052383
host_name                            0.052383
host_listings_count                  0.052383
host_total_listings_count            0.052383
host_has_profile_pic                 0.052383
host_identity_verified               0.052383
property_type                        0.026192
beds                                 0.026192
street                               0.000000
host_verifications                   0.000000
host_id                              0.000000
host_url                             0.000000
neighbourhood_group_cleansed         0.000000
picture_url                          0.000000
experiences_offered                  0.000000
description                          0.000000
name                                 0.000000
last_scraped                         0.000000
scrape_id                            0.000000
listing_url                          0.000000
neighbourhood_cleansed               0.000000
latitude                             0.000000
city                                 0.000000
maximum_nights                       0.000000
require_guest_phone_verification     0.000000
require_guest_profile_picture        0.000000
cancellation_policy                  0.000000
instant_bookable                     0.000000
jurisdiction_names                   0.000000
requires_license                     0.000000
number_of_reviews                    0.000000
calendar_last_scraped                0.000000
availability_365                     0.000000
availability_90                      0.000000
availability_60                      0.000000
availability_30                      0.000000
has_availability                     0.000000
calendar_updated                     0.000000
minimum_nights                       0.000000
state                                0.000000
extra_people                         0.000000
guests_included                      0.000000
price                                0.000000
amenities                            0.000000
bed_type                             0.000000
accommodates                         0.000000
room_type                            0.000000
is_location_exact                    0.000000
longitude                            0.000000
calculated_host_listings_count       0.000000
country                              0.000000
country_code                         0.000000
smart_location                       0.000000
market                               0.000000
id                                   0.000000
dtype: float64
In [600]:
# Checking the columns with missing values again in calendar dataset
(calendar.isnull().sum()/len(calendar)*100).sort_values(ascending=False)
Out[600]:
available     0.0
date          0.0
listing_id    0.0
dtype: float64

Checking the total number of remaining missing values

In [601]:
listings.isnull().sum().sum()
Out[601]:
9984
In [602]:
calendar.isnull().sum().sum()
Out[602]:
0

Replacing missing values with 0 in the listings and calendar datasets

In [603]:
listings.replace(np.nan, 0, inplace=True)
In [604]:
calendar.replace(np.nan, 0, inplace=True)

Checking the NaN values again

In [605]:
listings.isnull().sum()
Out[605]:
id                                  0
listing_url                         0
scrape_id                           0
last_scraped                        0
name                                0
summary                             0
space                               0
description                         0
experiences_offered                 0
thumbnail_url                       0
medium_url                          0
picture_url                         0
xl_picture_url                      0
host_id                             0
host_url                            0
host_name                           0
host_since                          0
host_location                       0
host_response_time                  0
host_response_rate                  0
host_is_superhost                   0
host_thumbnail_url                  0
host_picture_url                    0
host_neighbourhood                  0
host_listings_count                 0
host_total_listings_count           0
host_verifications                  0
host_has_profile_pic                0
host_identity_verified              0
street                              0
neighbourhood                       0
neighbourhood_cleansed              0
neighbourhood_group_cleansed        0
city                                0
state                               0
zipcode                             0
market                              0
smart_location                      0
country_code                        0
country                             0
latitude                            0
longitude                           0
is_location_exact                   0
property_type                       0
room_type                           0
accommodates                        0
bathrooms                           0
bedrooms                            0
beds                                0
bed_type                            0
amenities                           0
price                               0
guests_included                     0
extra_people                        0
minimum_nights                      0
maximum_nights                      0
calendar_updated                    0
has_availability                    0
availability_30                     0
availability_60                     0
availability_90                     0
availability_365                    0
calendar_last_scraped               0
number_of_reviews                   0
first_review                        0
last_review                         0
review_scores_rating                0
review_scores_accuracy              0
review_scores_cleanliness           0
review_scores_checkin               0
review_scores_communication         0
review_scores_location              0
review_scores_value                 0
requires_license                    0
jurisdiction_names                  0
instant_bookable                    0
cancellation_policy                 0
require_guest_profile_picture       0
require_guest_phone_verification    0
calculated_host_listings_count      0
reviews_per_month                   0
dtype: int64
In [606]:
calendar.isnull().sum()
Out[606]:
listing_id    0
date          0
available     0
dtype: int64
In [607]:
listings.head()
Out[607]:
id listing_url scrape_id last_scraped name summary space description experiences_offered thumbnail_url medium_url picture_url xl_picture_url host_id host_url host_name host_since host_location host_response_time host_response_rate host_is_superhost host_thumbnail_url host_picture_url host_neighbourhood host_listings_count host_total_listings_count host_verifications host_has_profile_pic host_identity_verified street neighbourhood neighbourhood_cleansed neighbourhood_group_cleansed city state zipcode market smart_location country_code country latitude longitude is_location_exact property_type room_type accommodates bathrooms bedrooms beds bed_type amenities price guests_included extra_people minimum_nights maximum_nights calendar_updated has_availability availability_30 availability_60 availability_90 availability_365 calendar_last_scraped number_of_reviews first_review last_review review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value requires_license jurisdiction_names instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month
0 241032 https://www.airbnb.com/rooms/241032 20160104002432 2016-01-04 Stylish Queen Anne Apartment 0 Make your self at home in this charming one-be... Make your self at home in this charming one-be... none 0 0 https://a1.muscache.com/ac/pictures/67560560/c... 0 956883 https://www.airbnb.com/users/show/956883 Maija 2011-08-11 Seattle, Washington, United States within a few hours 96% f https://a0.muscache.com/ac/users/956883/profil... https://a0.muscache.com/ac/users/956883/profil... Queen Anne 3.0 3.0 ['email', 'phone', 'reviews', 'kba'] t t Gilman Dr W, Seattle, WA 98119, United States Queen Anne West Queen Anne Queen Anne Seattle WA 98119 Seattle Seattle, WA US United States 47.636289 -122.371025 t Apartment Entire home/apt 4 1.0 1.0 1.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet","A... $85.00 2 $5.00 1 365 4 weeks ago t 14 41 71 346 2016-01-04 207 2011-11-01 2016-01-02 95.0 10.0 10.0 10.0 10.0 9.0 10.0 f WASHINGTON f moderate f f 2 4.07
1 953595 https://www.airbnb.com/rooms/953595 20160104002432 2016-01-04 Bright & Airy Queen Anne Apartment Chemically sensitive? We've removed the irrita... Beautiful, hypoallergenic apartment in an extr... Chemically sensitive? We've removed the irrita... none https://a0.muscache.com/ac/pictures/14409893/f... https://a0.muscache.com/im/pictures/14409893/f... https://a0.muscache.com/ac/pictures/14409893/f... https://a0.muscache.com/ac/pictures/14409893/f... 5177328 https://www.airbnb.com/users/show/5177328 Andrea 2013-02-21 Seattle, Washington, United States within an hour 98% t https://a0.muscache.com/ac/users/5177328/profi... https://a0.muscache.com/ac/users/5177328/profi... Queen Anne 6.0 6.0 ['email', 'phone', 'facebook', 'linkedin', 're... t t 7th Avenue West, Seattle, WA 98119, United States Queen Anne West Queen Anne Queen Anne Seattle WA 98119 Seattle Seattle, WA US United States 47.639123 -122.365666 t Apartment Entire home/apt 4 1.0 1.0 1.0 Real Bed {TV,Internet,"Wireless Internet",Kitchen,"Free... $150.00 1 $0.00 2 90 today t 13 13 16 291 2016-01-04 43 2013-08-19 2015-12-29 96.0 10.0 10.0 10.0 10.0 10.0 10.0 f WASHINGTON f strict t t 6 1.48
2 3308979 https://www.airbnb.com/rooms/3308979 20160104002432 2016-01-04 New Modern House-Amazing water view New modern house built in 2013. Spectacular s... Our house is modern, light and fresh with a wa... New modern house built in 2013. Spectacular s... none 0 0 https://a2.muscache.com/ac/pictures/b4324e0f-a... 0 16708587 https://www.airbnb.com/users/show/16708587 Jill 2014-06-12 Seattle, Washington, United States within a few hours 67% f https://a1.muscache.com/ac/users/16708587/prof... https://a1.muscache.com/ac/users/16708587/prof... Queen Anne 2.0 2.0 ['email', 'phone', 'google', 'reviews', 'jumio'] t t West Lee Street, Seattle, WA 98119, United States Queen Anne West Queen Anne Queen Anne Seattle WA 98119 Seattle Seattle, WA US United States 47.629724 -122.369483 t House Entire home/apt 11 4.5 5.0 7.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet","A... $975.00 10 $25.00 4 30 5 weeks ago t 1 6 17 220 2016-01-04 20 2014-07-30 2015-09-03 97.0 10.0 10.0 10.0 10.0 10.0 10.0 f WASHINGTON f strict f f 2 1.15
3 7421966 https://www.airbnb.com/rooms/7421966 20160104002432 2016-01-04 Queen Anne Chateau A charming apartment that sits atop Queen Anne... 0 A charming apartment that sits atop Queen Anne... none 0 0 https://a0.muscache.com/ac/pictures/94146944/6... 0 9851441 https://www.airbnb.com/users/show/9851441 Emily 2013-11-06 Seattle, Washington, United States 0 0 f https://a2.muscache.com/ac/users/9851441/profi... https://a2.muscache.com/ac/users/9851441/profi... Queen Anne 1.0 1.0 ['email', 'phone', 'facebook', 'reviews', 'jum... t t 8th Avenue West, Seattle, WA 98119, United States Queen Anne West Queen Anne Queen Anne Seattle WA 98119 Seattle Seattle, WA US United States 47.638473 -122.369279 t Apartment Entire home/apt 3 1.0 0.0 2.0 Real Bed {Internet,"Wireless Internet",Kitchen,"Indoor ... $100.00 1 $0.00 1 1125 6 months ago t 0 0 0 143 2016-01-04 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 f WASHINGTON f flexible f f 1 0.00
4 278830 https://www.airbnb.com/rooms/278830 20160104002432 2016-01-04 Charming craftsman 3 bdm house Cozy family craftman house in beautiful neighb... Cozy family craftman house in beautiful neighb... Cozy family craftman house in beautiful neighb... none 0 0 https://a1.muscache.com/ac/pictures/6120468/b0... 0 1452570 https://www.airbnb.com/users/show/1452570 Emily 2011-11-29 Seattle, Washington, United States within an hour 100% f https://a0.muscache.com/ac/users/1452570/profi... https://a0.muscache.com/ac/users/1452570/profi... Queen Anne 2.0 2.0 ['email', 'phone', 'facebook', 'reviews', 'kba'] t t 14th Ave W, Seattle, WA 98119, United States Queen Anne West Queen Anne Queen Anne Seattle WA 98119 Seattle Seattle, WA US United States 47.632918 -122.372471 t House Entire home/apt 6 2.0 3.0 3.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet",Ki... $450.00 6 $15.00 1 1125 7 weeks ago t 30 60 90 365 2016-01-04 38 2012-07-10 2015-10-24 92.0 9.0 9.0 10.0 10.0 9.0 9.0 f WASHINGTON f strict f f 1 0.89

Dropping unnecessary or duplicated columns

In [608]:
# dropping the unnecessary and duplicated columns
listings = listings.drop(columns=['host_name', 'calendar_updated', 'host_since', 'host_response_time', 'host_listings_count', 'neighbourhood_group_cleansed', 'zipcode', 'latitude', 'longitude', 'is_location_exact', 'has_availability', 'requires_license', 'country', 'city', 'state', 'street', 'market', 'smart_location', 'country_code', 'summary', 'description', 'space',  'jurisdiction_names', 'scrape_id', 'last_scraped', 'host_picture_url', 'host_verifications',  'listing_url', 'experiences_offered', 'thumbnail_url',  'medium_url', 'picture_url', 'host_url', 'host_thumbnail_url', 'xl_picture_url', 'calendar_last_scraped', 'neighbourhood', 'host_neighbourhood', 'first_review', 'last_review', 'name', 'host_location'], axis =1)
In [609]:
listings.head()
Out[609]:
id host_id host_response_rate host_is_superhost host_total_listings_count host_has_profile_pic host_identity_verified neighbourhood_cleansed property_type room_type accommodates bathrooms bedrooms beds bed_type amenities price guests_included extra_people minimum_nights maximum_nights availability_30 availability_60 availability_90 availability_365 number_of_reviews review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month
0 241032 956883 96% f 3.0 t t West Queen Anne Apartment Entire home/apt 4 1.0 1.0 1.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet","A... $85.00 2 $5.00 1 365 14 41 71 346 207 95.0 10.0 10.0 10.0 10.0 9.0 10.0 f moderate f f 2 4.07
1 953595 5177328 98% t 6.0 t t West Queen Anne Apartment Entire home/apt 4 1.0 1.0 1.0 Real Bed {TV,Internet,"Wireless Internet",Kitchen,"Free... $150.00 1 $0.00 2 90 13 13 16 291 43 96.0 10.0 10.0 10.0 10.0 10.0 10.0 f strict t t 6 1.48
2 3308979 16708587 67% f 2.0 t t West Queen Anne House Entire home/apt 11 4.5 5.0 7.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet","A... $975.00 10 $25.00 4 30 1 6 17 220 20 97.0 10.0 10.0 10.0 10.0 10.0 10.0 f strict f f 2 1.15
3 7421966 9851441 0 f 1.0 t t West Queen Anne Apartment Entire home/apt 3 1.0 0.0 2.0 Real Bed {Internet,"Wireless Internet",Kitchen,"Indoor ... $100.00 1 $0.00 1 1125 0 0 0 143 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 f flexible f f 1 0.00
4 278830 1452570 100% f 2.0 t t West Queen Anne House Entire home/apt 6 2.0 3.0 3.0 Real Bed {TV,"Cable TV",Internet,"Wireless Internet",Ki... $450.00 6 $15.00 1 1125 30 60 90 365 38 92.0 9.0 9.0 10.0 10.0 9.0 9.0 f strict f f 1 0.89
In [610]:
listings.info();
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 39 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   id                                3818 non-null   int64  
 1   host_id                           3818 non-null   int64  
 2   host_response_rate                3818 non-null   object 
 3   host_is_superhost                 3818 non-null   object 
 4   host_total_listings_count         3818 non-null   float64
 5   host_has_profile_pic              3818 non-null   object 
 6   host_identity_verified            3818 non-null   object 
 7   neighbourhood_cleansed            3818 non-null   object 
 8   property_type                     3818 non-null   object 
 9   room_type                         3818 non-null   object 
 10  accommodates                      3818 non-null   int64  
 11  bathrooms                         3818 non-null   float64
 12  bedrooms                          3818 non-null   float64
 13  beds                              3818 non-null   float64
 14  bed_type                          3818 non-null   object 
 15  amenities                         3818 non-null   object 
 16  price                             3818 non-null   object 
 17  guests_included                   3818 non-null   int64  
 18  extra_people                      3818 non-null   object 
 19  minimum_nights                    3818 non-null   int64  
 20  maximum_nights                    3818 non-null   int64  
 21  availability_30                   3818 non-null   int64  
 22  availability_60                   3818 non-null   int64  
 23  availability_90                   3818 non-null   int64  
 24  availability_365                  3818 non-null   int64  
 25  number_of_reviews                 3818 non-null   int64  
 26  review_scores_rating              3818 non-null   float64
 27  review_scores_accuracy            3818 non-null   float64
 28  review_scores_cleanliness         3818 non-null   float64
 29  review_scores_checkin             3818 non-null   float64
 30  review_scores_communication       3818 non-null   float64
 31  review_scores_location            3818 non-null   float64
 32  review_scores_value               3818 non-null   float64
 33  instant_bookable                  3818 non-null   object 
 34  cancellation_policy               3818 non-null   object 
 35  require_guest_profile_picture     3818 non-null   object 
 36  require_guest_phone_verification  3818 non-null   object 
 37  calculated_host_listings_count    3818 non-null   int64  
 38  reviews_per_month                 3818 non-null   float64
dtypes: float64(12), int64(12), object(15)
memory usage: 1.1+ MB

We need to clean the price columns (price and extra_people) by removing the $ symbol and converting them from object to float

In [611]:
listings['price'] = listings.price.str.strip('$')
In [612]:
listings['extra_people'] = listings.extra_people.str.strip('$')
In [613]:
listings['price'] = pd.to_numeric(listings['price'],errors='coerce')
In [614]:
listings['extra_people'] = pd.to_numeric(listings['extra_people'],errors='coerce')
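
One caveat with the cells above: str.strip('$') removes only the dollar sign, so any price written with a thousands separator (for example a hypothetical "$1,000.00") cannot be parsed and errors='coerce' silently turns it into NaN. A minimal sketch of a more tolerant conversion follows. The helper name currency_to_float and the sample values are assumptions for illustration.

```python
import pandas as pd

def currency_to_float(series):
    """Convert strings such as '$1,000.00' to floats."""
    # astype(str) also covers non-string entries such as an integer 0 fill value
    cleaned = series.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")

prices = pd.Series(["$85.00", "$1,000.00", "$5.00"])

# Stripping only '$' leaves the comma in place, so the second value is coerced to NaN
strip_only = pd.to_numeric(prices.str.strip("$"), errors="coerce")

# Removing both '$' and ',' keeps every value
converted = currency_to_float(prices)
print(strip_only.isna().sum(), converted.tolist())
```

Checking listings['price'].isna().sum() after the conversion is a quick way to see whether any rows were affected.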

Converting the date column of the calendar dataset from object to datetime datatype

In [615]:
calendar['date'] = pd.to_datetime(calendar['date'])

We need to clean the columns containing % by removing the % symbol and converting them from object to float

In [616]:
listings['host_response_rate'] = listings.host_response_rate.str.strip('%')
In [617]:
listings['host_response_rate'] = pd.to_numeric(listings['host_response_rate'],errors='coerce')
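
Because missing values were replaced with the integer 0 earlier, host_response_rate now mixes strings such as '96%' with plain integers. The .str accessor returns NaN for non-string entries, which is why the listings_num.head() output further down shows NaN for this column in row 3. The sketch below reproduces the effect on a toy Series (hypothetical values) and shows one possible way to restore the intended 0.

```python
import numpy as np
import pandas as pd

# Toy column mimicking host_response_rate after NaN was replaced by integer 0
rate = pd.Series(["96%", 0, "100%"], dtype="object")

# .str methods return NaN for non-string entries, so the integer 0 is lost here
stripped = rate.str.strip("%")
as_float = pd.to_numeric(stripped, errors="coerce")

# One option: fill again after the conversion so the column has no NaN
as_float_filled = as_float.fillna(0)
print(as_float.tolist(), as_float_filled.tolist())
```

Whether 0 is a sensible stand-in for an unknown response rate is a modelling decision; the sketch only keeps the column consistent with the earlier fill.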
In [618]:
listings.dtypes
Out[618]:
id                                    int64
host_id                               int64
host_response_rate                  float64
host_is_superhost                    object
host_total_listings_count           float64
host_has_profile_pic                 object
host_identity_verified               object
neighbourhood_cleansed               object
property_type                        object
room_type                            object
accommodates                          int64
bathrooms                           float64
bedrooms                            float64
beds                                float64
bed_type                             object
amenities                            object
price                               float64
guests_included                       int64
extra_people                        float64
minimum_nights                        int64
maximum_nights                        int64
availability_30                       int64
availability_60                       int64
availability_90                       int64
availability_365                      int64
number_of_reviews                     int64
review_scores_rating                float64
review_scores_accuracy              float64
review_scores_cleanliness           float64
review_scores_checkin               float64
review_scores_communication         float64
review_scores_location              float64
review_scores_value                 float64
instant_bookable                     object
cancellation_policy                  object
require_guest_profile_picture        object
require_guest_phone_verification     object
calculated_host_listings_count        int64
reviews_per_month                   float64
dtype: object

Converting the categorical variables to numeric

In [619]:
listings_num = listings.copy()
In [620]:
listings_num['host_identity_verified'] = listings_num['host_identity_verified'].astype('category').cat.codes
In [621]:
listings_num['host_is_superhost'] = listings_num['host_is_superhost'].astype('category').cat.codes
In [622]:
listings_num['host_has_profile_pic'] = listings_num['host_has_profile_pic'].astype('category').cat.codes
In [623]:
listings_num['property_type'] = listings_num['property_type'].astype('category').cat.codes
In [624]:
listings_num['room_type'] = listings_num['room_type'].astype('category').cat.codes
In [625]:
listings_num['bed_type'] = listings_num['bed_type'].astype('category').cat.codes
In [626]:
listings_num['amenities'] = listings_num['amenities'].astype('category').cat.codes
In [627]:
listings_num['instant_bookable'] = listings_num['instant_bookable'].astype('category').cat.codes
In [628]:
listings_num['cancellation_policy'] = listings_num['cancellation_policy'].astype('category').cat.codes
In [629]:
listings_num['require_guest_profile_picture'] = listings_num['require_guest_profile_picture'].astype('category').cat.codes
In [630]:
listings_num['require_guest_phone_verification'] = listings_num['require_guest_phone_verification'].astype('category').cat.codes
In [631]:
listings_num['id'] = listings_num['id'].astype('category').cat.codes
In [632]:
listings_num['host_id'] = listings_num['host_id'].astype('category').cat.codes
In [664]:
listings_num['neighbourhood_cleansed'] = listings_num['neighbourhood_cleansed'].astype('category').cat.codes
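
The cell-by-cell conversions above all apply the same operation, so they can be written as one loop. The sketch below uses a toy DataFrame with hypothetical values, and the list name categorical_cols is an assumption. Note that cat.codes numbers the categories in sorted order, so the resulting integers carry no real ranking for nominal columns such as room_type.

```python
import pandas as pd

# Toy data standing in for a few listings columns (hypothetical values)
toy = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Shared room", "Entire home/apt"],
    "instant_bookable": ["f", "t", "f", "f"],
    "price": [60.0, 150.0, 35.0, 120.0],
})

categorical_cols = ["room_type", "instant_bookable"]

encoded = toy.copy()
for col in categorical_cols:
    # Each distinct value gets an integer code based on its sorted position
    encoded[col] = encoded[col].astype("category").cat.codes

print(encoded)
```

In the notebook's own data the t/f columns come out as 1 and 2 rather than 0 and 1, likely because the earlier fill value 0 became a category of its own.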
In [665]:
listings_num.head()
Out[665]:
id host_id host_response_rate host_is_superhost host_total_listings_count host_has_profile_pic host_identity_verified neighbourhood_cleansed property_type room_type accommodates bathrooms bedrooms beds bed_type amenities price guests_included extra_people minimum_nights maximum_nights availability_30 availability_60 availability_90 availability_365 number_of_reviews review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month
0 127 229 96.0 1 3.0 2 2 81 1 0 4 1.0 1.0 1.0 4 1467 85.0 2 5.0 1 365 14 41 71 346 207 95.0 10.0 10.0 10.0 10.0 9.0 10.0 0 1 0 0 2 4.07
1 374 753 98.0 2 6.0 2 2 81 1 0 4 1.0 1.0 1.0 4 2729 150.0 1 0.0 2 90 13 13 16 291 43 96.0 10.0 10.0 10.0 10.0 10.0 10.0 0 2 1 1 6 1.48
2 977 1534 67.0 1 2.0 2 2 81 10 0 11 4.5 5.0 7.0 4 1350 975.0 10 25.0 4 30 1 6 17 220 20 97.0 10.0 10.0 10.0 10.0 10.0 10.0 0 2 0 0 2 1.15
3 2512 1162 NaN 1 1.0 2 2 81 1 0 3 1.0 0.0 2.0 4 864 100.0 1 0.0 1 1125 0 0 0 143 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 1 0.00
4 142 319 100.0 1 2.0 2 2 81 10 0 6 2.0 3.0 3.0 4 2060 450.0 6 15.0 1 1125 30 60 90 365 38 92.0 9.0 9.0 10.0 10.0 9.0 9.0 0 2 0 0 1 0.89

Finding out whether there are duplicate host names

5. Data Modeling:

In [634]:
# listings_dummy.head()
In [635]:
# listings_dummy = pd.get_dummies(listings, drop_first=True)
In [636]:
# dups_shape = listings.pivot_table(index=['host_id'], aggfunc='size')
In [637]:
# dups_shape.index
In [666]:
listings_num.corr()
Out[666]:
id host_id host_response_rate host_is_superhost host_total_listings_count host_has_profile_pic host_identity_verified neighbourhood_cleansed property_type room_type accommodates bathrooms bedrooms beds bed_type amenities price guests_included extra_people minimum_nights maximum_nights availability_30 availability_60 availability_90 availability_365 number_of_reviews review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month
id 1.000000 0.526779 -0.023486 -0.179286 -0.017558 -0.031746 -0.165083 -0.000194 -0.044046 0.012270 -0.072106 -0.045030 -0.057565 -0.048582 -0.015800 0.089404 -0.051899 -0.099935 -0.088781 -0.023612 0.096889 -0.063821 -0.072278 -0.084045 -0.159727 -0.491456 -0.419528 -0.422457 -0.413166 -0.431296 -0.427130 -0.415672 -0.413519 0.042352 -0.277061 -0.215819 -0.256966 -0.054340 -0.119738
host_id 0.526779 1.000000 -0.017555 -0.091425 -0.074499 -0.021404 -0.232388 0.007764 -0.043996 0.036661 -0.107626 -0.043565 -0.079650 -0.075468 -0.037162 0.062870 -0.050778 -0.082703 -0.033159 -0.022948 0.061492 -0.040036 -0.044780 -0.047325 -0.089550 -0.264044 -0.170898 -0.171969 -0.166864 -0.176127 -0.172379 -0.166775 -0.163151 0.082946 -0.245062 -0.299143 -0.334560 -0.212590 0.027363
host_response_rate -0.023486 -0.017555 1.000000 0.153810 0.022147 -0.000372 0.090387 -0.031925 0.044479 -0.010282 0.005433 0.005279 -0.001040 0.015617 0.037637 0.007689 -0.014071 0.065056 0.019094 0.006604 -0.082118 -0.047150 -0.035688 -0.031227 -0.087838 0.104779 0.097171 0.090600 0.102826 0.090744 0.093257 0.080213 0.096658 0.094017 0.092686 0.015841 0.002301 -0.087481 0.168230
host_is_superhost -0.179286 -0.091425 0.153810 1.000000 -0.039775 0.054250 0.151422 -0.005766 0.033836 -0.015971 -0.002012 0.001362 -0.026557 -0.022382 0.028015 0.032379 0.012868 0.060904 0.041727 -0.005713 -0.037321 0.013424 0.031909 0.042953 -0.012394 0.262108 0.220262 0.219492 0.225617 0.207942 0.204925 0.202671 0.222906 0.082293 0.110963 0.099962 0.104161 -0.092745 0.311759
host_total_listings_count -0.017558 -0.074499 0.022147 -0.039775 1.000000 0.010480 0.086930 -0.101094 -0.111521 -0.057998 0.111284 0.066845 0.062644 0.085609 0.031055 -0.023497 0.095591 -0.059217 -0.063490 0.001901 0.022699 0.119754 0.124721 0.124040 0.086184 -0.062136 -0.031702 -0.032574 -0.022896 -0.037898 -0.031785 -0.012787 -0.033066 -0.044281 0.206622 0.103811 0.090513 0.224256 -0.093774
host_has_profile_pic -0.031746 -0.021404 -0.000372 0.054250 0.010480 1.000000 0.128073 0.014681 0.026393 0.000125 0.012362 0.009117 0.020640 0.014993 -0.008678 -0.014043 -0.021250 0.014036 -0.008958 0.001299 0.001401 -0.021428 -0.015065 -0.013153 0.029354 0.023114 0.043570 0.044987 0.043464 0.046023 0.045485 0.043987 0.042760 -0.003438 0.014179 0.013941 0.015186 0.015199 0.030048
host_identity_verified -0.165083 -0.232388 0.090387 0.151422 0.086930 0.128073 1.000000 -0.029047 0.008028 -0.015848 0.041925 -0.009932 -0.005346 0.014965 0.021763 0.022243 0.002895 0.033691 0.046204 -0.027499 -0.012315 0.022072 0.031827 0.044146 0.024235 0.098041 0.199805 0.197649 0.200272 0.195021 0.195122 0.192436 0.192678 -0.001063 0.138001 0.112566 0.131929 0.059250 0.134237
neighbourhood_cleansed -0.000194 0.007764 -0.031925 -0.005766 -0.101094 0.014681 -0.029047 1.000000 0.193953 0.112103 -0.020001 0.038187 0.063422 0.020385 -0.029527 -0.065899 -0.035083 0.056994 0.028576 -0.016271 -0.027223 0.023969 0.018351 0.018572 0.048532 0.001414 -0.000099 0.002624 -0.004881 -0.001239 -0.001074 -0.009411 0.006711 -0.027020 -0.056267 -0.094926 -0.094076 0.061097 -0.016853
property_type -0.044046 -0.043996 0.044479 0.033836 -0.111521 0.026393 0.008028 0.193953 1.000000 0.274013 0.119473 0.240772 0.285680 0.163768 0.036114 -0.071746 0.043210 0.145762 0.053217 -0.017254 -0.031952 0.025113 0.027885 0.029139 0.037427 0.009201 -0.002173 -0.006709 -0.008000 -0.007489 -0.009368 -0.029319 0.004783 -0.057887 -0.033126 -0.062758 -0.056390 0.000627 -0.021588
room_type 0.012270 0.036661 -0.010282 -0.015971 -0.057998 0.000125 -0.015848 0.112103 0.274013 1.000000 -0.460971 -0.099129 -0.233161 -0.344106 -0.208938 -0.140851 -0.433745 -0.250081 -0.083386 -0.025187 -0.005987 0.152464 0.163441 0.166518 0.133112 0.037427 -0.024999 -0.033530 -0.042728 -0.030754 -0.028045 -0.036787 -0.019521 -0.071934 -0.206957 -0.045679 -0.060012 0.165843 0.032835
accommodates -0.072106 -0.107626 0.005433 -0.002012 0.111284 0.012362 0.041925 -0.020001 0.119473 -0.460971 1.000000 0.533586 0.769680 0.860714 0.130372 0.101470 0.659512 0.532796 0.148390 0.017097 0.003291 -0.043169 -0.048761 -0.060468 -0.031535 -0.072978 0.036648 0.034294 0.042512 0.039341 0.038589 0.035531 0.029379 0.024355 0.282552 0.060069 0.064525 -0.029525 -0.101904
bathrooms -0.045030 -0.043565 0.005279 0.001362 0.066845 0.009117 -0.009932 0.038187 0.240772 -0.099129 0.533586 1.000000 0.605989 0.528864 0.068096 0.041466 0.519623 0.304501 0.080811 0.006567 -0.013941 -0.044530 -0.054642 -0.062456 -0.006181 -0.094550 0.007483 0.000434 0.004769 0.001875 -0.000289 0.000106 0.004277 -0.048376 0.137011 0.020483 0.010624 -0.004103 -0.134198
bedrooms -0.057565 -0.079650 -0.001040 -0.026557 0.062644 0.020640 -0.005346 0.063422 0.285680 -0.233161 0.769680 0.605989 1.000000 0.752720 0.080169 0.070969 0.632741 0.456755 0.109494 0.012084 -0.008061 -0.078166 -0.091714 -0.104119 -0.051314 -0.106732 -0.010024 -0.014239 -0.013390 -0.011083 -0.012474 -0.017355 -0.013831 -0.070145 0.193780 0.009484 0.008340 -0.045418 -0.191424
beds -0.048582 -0.075468 0.015617 -0.022382 0.085609 0.014993 0.014965 0.020385 0.163768 -0.344106 0.860714 0.528864 0.752720 1.000000 0.104693 0.046418 0.595196 0.460561 0.131039 0.002703 -0.009192 -0.028991 -0.036814 -0.047924 -0.010147 -0.088811 0.014087 0.008172 0.016969 0.013675 0.014291 0.012470 0.010581 0.024561 0.239074 0.036514 0.045101 0.010597 -0.119031
bed_type -0.015800 -0.037162 0.037637 0.028015 0.031055 -0.008678 0.021763 -0.029527 0.036114 -0.208938 0.130372 0.068096 0.080169 0.104693 1.000000 0.019528 0.116214 0.051333 0.004903 0.008635 -0.000616 -0.001771 -0.000551 0.003371 0.007070 0.020279 0.034034 0.034728 0.036399 0.033097 0.031358 0.034804 0.032086 0.044118 0.122112 0.033292 0.040005 0.048443 0.024768
amenities 0.089404 0.062870 0.007689 0.032379 -0.023497 -0.014043 0.022243 -0.065899 -0.071746 -0.140851 0.101470 0.041466 0.070969 0.046418 0.019528 1.000000 0.078647 0.081836 0.043266 -0.013900 0.029876 -0.059684 -0.062954 -0.064805 -0.082438 -0.037006 -0.029342 -0.029243 -0.029012 -0.030630 -0.031560 -0.026889 -0.030823 0.019906 0.001967 -0.022802 -0.025079 -0.134355 0.017130
price -0.051899 -0.050778 -0.014071 0.012868 0.095591 -0.021250 0.002895 -0.035083 0.043210 -0.433745 0.659512 0.519623 0.632741 0.595196 0.116214 0.078647 1.000000 0.399100 0.131117 0.017728 -0.003902 -0.039432 -0.051732 -0.061422 -0.018035 -0.124812 -0.021718 -0.023546 -0.018653 -0.025696 -0.026680 -0.014504 -0.034318 -0.031011 0.215375 0.064863 0.059163 -0.053920 -0.190477
guests_included -0.099935 -0.082703 0.065056 0.060904 -0.059217 0.014036 0.033691 0.056994 0.145762 -0.250081 0.532796 0.304501 0.456755 0.460561 0.051333 0.081836 0.399100 1.000000 0.422452 -0.001659 -0.018637 -0.045357 -0.046944 -0.047100 -0.048922 0.028114 0.059208 0.060765 0.063163 0.065148 0.061907 0.056200 0.056260 0.015738 0.216053 0.003668 0.013275 -0.077759 0.003972
extra_people -0.088781 -0.033159 0.019094 0.041727 -0.063490 -0.008958 0.046204 0.028576 0.053217 -0.083386 0.148390 0.080811 0.109494 0.131039 0.004903 0.043266 0.131117 0.422452 1.000000 -0.010692 -0.018509 0.026570 0.021808 0.026427 0.012101 0.044395 0.078921 0.078441 0.084772 0.074547 0.075735 0.083744 0.075668 0.022009 0.142026 0.024782 0.021103 -0.043115 0.040026
minimum_nights -0.023612 -0.022948 0.006604 -0.005713 0.001901 0.001299 -0.027499 -0.016271 -0.017254 -0.025187 0.017097 0.006567 0.012084 0.002703 0.008635 -0.013900 0.017728 -0.001659 -0.010692 1.000000 0.003161 0.013205 0.010290 0.009076 0.009087 -0.013818 0.003776 0.001580 -0.002556 0.000795 0.004618 0.006118 0.002183 -0.014624 0.017332 0.004831 0.001485 0.000512 -0.029727
maximum_nights 0.096889 0.061492 -0.082118 -0.037321 0.022699 0.001401 -0.012315 -0.027223 -0.031952 -0.005987 0.003291 -0.013941 -0.008061 -0.009192 -0.000616 0.029876 -0.003902 -0.018637 -0.018509 0.003161 1.000000 0.012254 -0.009236 -0.002945 0.007265 -0.081578 -0.039596 -0.035457 -0.036216 -0.043953 -0.039785 -0.037799 -0.038967 -0.013534 -0.015909 -0.028365 -0.035810 0.031274 -0.043766
availability_30 -0.063821 -0.040036 -0.047150 0.013424 0.119754 -0.021428 0.022072 0.023969 0.025113 0.152464 -0.043169 -0.044530 -0.078166 -0.028991 -0.001771 -0.059684 -0.039432 -0.045357 0.026570 0.013205 0.012254 1.000000 0.936122 0.875778 0.503881 0.074611 0.024553 0.024417 0.034499 0.028055 0.029290 0.028067 0.019250 0.041661 0.063477 0.070533 0.069565 0.124677 0.057327
availability_60 -0.072278 -0.044780 -0.035688 0.031909 0.124721 -0.015065 0.031827 0.018351 0.027885 0.163441 -0.048761 -0.054642 -0.091714 -0.036814 -0.000551 -0.062954 -0.051732 -0.046944 0.021808 0.010290 -0.009236 0.936122 1.000000 0.973353 0.572857 0.099099 0.050191 0.051953 0.061069 0.055348 0.054545 0.052917 0.046599 0.065150 0.076134 0.070722 0.072845 0.130154 0.111184
availability_90 -0.084045 -0.047325 -0.031227 0.042953 0.124040 -0.013153 0.044146 0.018572 0.029139 0.166518 -0.060468 -0.062456 -0.104119 -0.047924 0.003371 -0.064805 -0.061422 -0.047100 0.026427 0.009076 -0.002945 0.875778 0.973353 1.000000 0.619355 0.105257 0.070979 0.073816 0.081156 0.075963 0.073861 0.072132 0.066886 0.073906 0.084865 0.066495 0.072351 0.132391 0.131875
availability_365 -0.159727 -0.089550 -0.087838 -0.012394 0.086184 0.029354 0.024235 0.048532 0.037427 0.133112 -0.031535 -0.006181 -0.051314 -0.010147 0.007070 -0.082438 -0.018035 -0.048922 0.012101 0.009087 0.007265 0.503881 0.572857 0.619355 1.000000 0.094273 0.069420 0.072251 0.070895 0.075378 0.072752 0.072679 0.066784 0.003744 0.047364 0.015704 0.037126 0.136881 0.034873
number_of_reviews -0.491456 -0.264044 0.104779 0.262108 -0.062136 0.023114 0.098041 0.001414 0.009201 0.037427 -0.072978 -0.094550 -0.106732 -0.088811 0.020279 -0.037006 -0.124812 0.028114 0.044395 -0.013818 -0.081578 0.074611 0.099099 0.105257 0.094273 1.000000 0.267817 0.275221 0.272397 0.277842 0.273798 0.262834 0.270940 0.123071 0.118048 0.186719 0.213455 -0.067194 0.601509
review_scores_rating -0.419528 -0.170898 0.097171 0.220262 -0.031702 0.043570 0.199805 -0.000099 -0.002173 -0.024999 0.036648 0.007483 -0.010024 0.014087 0.034034 -0.029342 -0.021718 0.059208 0.078921 0.003776 -0.039596 0.024553 0.050191 0.070979 0.069420 0.267817 1.000000 0.979406 0.982442 0.978234 0.985243 0.976161 0.981976 0.094494 0.209938 0.069252 0.084511 -0.017523 0.432112
review_scores_accuracy -0.422457 -0.171969 0.090600 0.219492 -0.032574 0.044987 0.197649 0.002624 -0.006709 -0.033530 0.034294 0.000434 -0.014239 0.008172 0.034728 -0.029243 -0.023546 0.060765 0.078441 0.001580 -0.035457 0.024417 0.051953 0.073816 0.072251 0.275221 0.979406 1.000000 0.979645 0.983324 0.980084 0.977689 0.982167 0.097687 0.211261 0.066378 0.081978 -0.009938 0.442096
review_scores_cleanliness -0.413166 -0.166864 0.102826 0.225617 -0.022896 0.043464 0.200272 -0.004881 -0.008000 -0.042728 0.042512 0.004769 -0.013390 0.016969 0.036399 -0.029012 -0.018653 0.063163 0.084772 -0.002556 -0.036216 0.034499 0.061069 0.081156 0.070895 0.272397 0.982442 0.979645 1.000000 0.976824 0.979553 0.971349 0.976158 0.103391 0.220859 0.075094 0.088532 -0.013417 0.441598
review_scores_checkin -0.431296 -0.176127 0.090744 0.207942 -0.037898 0.046023 0.195021 -0.001239 -0.007489 -0.030754 0.039341 0.001875 -0.011083 0.013675 0.033097 -0.030630 -0.025696 0.065148 0.074547 0.000795 -0.043953 0.028055 0.055348 0.075963 0.075378 0.277842 0.978234 0.983324 0.976824 1.000000 0.988187 0.979843 0.979225 0.097087 0.207792 0.070967 0.085630 -0.009513 0.437739
review_scores_communication -0.427130 -0.172379 0.093257 0.204925 -0.031785 0.045485 0.195122 -0.001074 -0.009368 -0.028045 0.038589 -0.000289 -0.012474 0.014291 0.031358 -0.031560 -0.026680 0.061907 0.075735 0.004618 -0.039785 0.029290 0.054545 0.073861 0.072752 0.273798 0.985243 0.980084 0.979553 0.988187 1.000000 0.981594 0.980996 0.096809 0.210278 0.072468 0.087086 -0.007803 0.434476
review_scores_location -0.415672 -0.166775 0.080213 0.202671 -0.012787 0.043987 0.192436 -0.009411 -0.029319 -0.036787 0.035531 0.000106 -0.017355 0.012470 0.034804 -0.026889 -0.014504 0.056200 0.083744 0.006118 -0.037799 0.028067 0.052917 0.072132 0.072679 0.262834 0.976161 0.977689 0.971349 0.979843 0.981594 1.000000 0.979965 0.095636 0.213780 0.065805 0.081308 -0.002813 0.428736
review_scores_value -0.413519 -0.163151 0.096658 0.222906 -0.033066 0.042760 0.192678 0.006711 0.004783 -0.019521 0.029379 0.004277 -0.013831 0.010581 0.032086 -0.030823 -0.034318 0.056260 0.075668 0.002183 -0.038967 0.019250 0.046599 0.066886 0.066784 0.270940 0.981976 0.982167 0.976158 0.979225 0.980996 0.979965 1.000000 0.100836 0.205219 0.061137 0.075739 -0.017843 0.439302
instant_bookable 0.042352 0.082946 0.094017 0.082293 -0.044281 -0.003438 -0.001063 -0.027020 -0.057887 -0.071934 0.024355 -0.048376 -0.070145 0.024561 0.044118 0.019906 -0.031011 0.015738 0.022009 -0.014624 -0.013534 0.041661 0.065150 0.073906 0.003744 0.123071 0.094494 0.097687 0.103391 0.097087 0.096809 0.095636 0.100836 1.000000 0.051115 -0.014844 -0.012281 0.006972 0.272128
cancellation_policy -0.277061 -0.245062 0.092686 0.110963 0.206622 0.014179 0.138001 -0.056267 -0.033126 -0.206957 0.282552 0.137011 0.193780 0.239074 0.122112 0.001967 0.215375 0.216053 0.142026 0.017332 -0.015909 0.063477 0.076134 0.084865 0.047364 0.118048 0.209938 0.211261 0.220859 0.207792 0.210278 0.213780 0.205219 0.051115 1.000000 0.210900 0.220589 0.215044 0.071186
require_guest_profile_picture -0.215819 -0.299143 0.015841 0.099962 0.103811 0.013941 0.112566 -0.094926 -0.062758 -0.045679 0.060069 0.020483 0.009484 0.036514 0.033292 -0.022802 0.064863 0.003668 0.024782 0.004831 -0.028365 0.070533 0.070722 0.066495 0.015704 0.186719 0.069252 0.066378 0.075094 0.070967 0.072468 0.065805 0.061137 -0.014844 0.210900 1.000000 0.873632 0.203791 0.015392
require_guest_phone_verification -0.256966 -0.334560 0.002301 0.104161 0.090513 0.015186 0.131929 -0.094076 -0.056390 -0.060012 0.064525 0.010624 0.008340 0.045101 0.040005 -0.025079 0.059163 0.013275 0.021103 0.001485 -0.035810 0.069565 0.072845 0.072351 0.037126 0.213455 0.084511 0.081978 0.088532 0.085630 0.087086 0.081308 0.075739 -0.012281 0.220589 0.873632 1.000000 0.180596 0.011208
calculated_host_listings_count -0.054340 -0.212590 -0.087481 -0.092745 0.224256 0.015199 0.059250 0.061097 0.000627 0.165843 -0.029525 -0.004103 -0.045418 0.010597 0.048443 -0.134355 -0.053920 -0.077759 -0.043115 0.000512 0.031274 0.124677 0.130154 0.132391 0.136881 -0.067194 -0.017523 -0.009938 -0.013417 -0.009513 -0.007803 -0.002813 -0.017843 0.006972 0.215044 0.203791 0.180596 1.000000 -0.079649
reviews_per_month -0.119738 0.027363 0.168230 0.311759 -0.093774 0.030048 0.134237 -0.016853 -0.021588 0.032835 -0.101904 -0.134198 -0.191424 -0.119031 0.024768 0.017130 -0.190477 0.003972 0.040026 -0.029727 -0.043766 0.057327 0.111184 0.131875 0.034873 0.601509 0.432112 0.442096 0.441598 0.437739 0.434476 0.428736 0.439302 0.272128 0.071186 0.015392 0.011208 -0.079649 1.000000
In [689]:
tophost=listings.host_id.value_counts().head(15)
tophost
Out[689]:
8534462     46
4962900     39
1243056     37
430709      36
3074414     34
74305       33
26967583    21
7354306     18
42537846    16
1623580     12
658155      12
2911360     11
862329      11
754810      10
31148752    10
Name: host_id, dtype: int64
In [690]:
p1=tophost.plot(kind='bar')
p1.set_title('Hosts with the most listings')
p1.set_ylabel('Count of listings')
p1.set_xlabel('Host IDs')
p1.set_xticklabels(p1.get_xticklabels(), rotation=90);
In [670]:
topprice=listings.price.max()
topprice
Out[670]:
999.0
In [671]:
top_neighbourhood_cleansed=listings.neighbourhood_cleansed.value_counts().head(15)
top_neighbourhood_cleansed
Out[671]:
Broadway                     397
Belltown                     234
Wallingford                  167
Fremont                      158
Minor                        135
University District          122
Stevens                      119
First Hill                   108
Central Business District    103
Lower Queen Anne              94
Greenwood                     89
East Queen Anne               82
North Beacon Hill             78
Phinney Ridge                 73
Adams                         70
Name: neighbourhood_cleansed, dtype: int64
In [691]:
p2=top_neighbourhood_cleansed.plot(kind='bar')
p2.set_title('Neighbourhoods with the most listings')
p2.set_ylabel('Count of listings')
p2.set_xlabel('Neighbourhood Name')
p2.set_xticklabels(p2.get_xticklabels(), rotation=90);
In [673]:
listings_num.columns
Out[673]:
Index(['id', 'host_id', 'host_response_rate', 'host_is_superhost',
       'host_total_listings_count', 'host_has_profile_pic',
       'host_identity_verified', 'neighbourhood_cleansed', 'property_type',
       'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds',
       'bed_type', 'amenities', 'price', 'guests_included', 'extra_people',
       'minimum_nights', 'maximum_nights', 'availability_30',
       'availability_60', 'availability_90', 'availability_365',
       'number_of_reviews', 'review_scores_rating', 'review_scores_accuracy',
       'review_scores_cleanliness', 'review_scores_checkin',
       'review_scores_communication', 'review_scores_location',
       'review_scores_value', 'instant_bookable', 'cancellation_policy',
       'require_guest_profile_picture', 'require_guest_phone_verification',
       'calculated_host_listings_count', 'reviews_per_month'],
      dtype='object')
In [674]:
listings_num.head()
Out[674]:
id host_id host_response_rate host_is_superhost host_total_listings_count host_has_profile_pic host_identity_verified neighbourhood_cleansed property_type room_type accommodates bathrooms bedrooms beds bed_type amenities price guests_included extra_people minimum_nights maximum_nights availability_30 availability_60 availability_90 availability_365 number_of_reviews review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month
0 127 229 96.0 1 3.0 2 2 81 1 0 4 1.0 1.0 1.0 4 1467 85.0 2 5.0 1 365 14 41 71 346 207 95.0 10.0 10.0 10.0 10.0 9.0 10.0 0 1 0 0 2 4.07
1 374 753 98.0 2 6.0 2 2 81 1 0 4 1.0 1.0 1.0 4 2729 150.0 1 0.0 2 90 13 13 16 291 43 96.0 10.0 10.0 10.0 10.0 10.0 10.0 0 2 1 1 6 1.48
2 977 1534 67.0 1 2.0 2 2 81 10 0 11 4.5 5.0 7.0 4 1350 975.0 10 25.0 4 30 1 6 17 220 20 97.0 10.0 10.0 10.0 10.0 10.0 10.0 0 2 0 0 2 1.15
3 2512 1162 NaN 1 1.0 2 2 81 1 0 3 1.0 0.0 2.0 4 864 100.0 1 0.0 1 1125 0 0 0 143 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 1 0.00
4 142 319 100.0 1 2.0 2 2 81 10 0 6 2.0 3.0 3.0 4 2060 450.0 6 15.0 1 1125 30 60 90 365 38 92.0 9.0 9.0 10.0 10.0 9.0 9.0 0 2 0 0 1 0.89

We use the numerically encoded columns in listings_num to check which factors are related to the price.
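The integer codes in columns such as room_type or property_type come from an encoding step earlier in the notebook. As a minimal sketch of one common way to produce such codes (an assumption, not necessarily the exact method used here), pandas category codes can be applied to a small made-up column:

```python
import pandas as pd

# Hypothetical toy column standing in for a categorical listings column.
room_type = pd.Series(
    ["Private room", "Entire home/apt", "Shared room", "Entire home/apt"]
)

# Convert to a categorical dtype; each value is replaced by the position
# of its category in the (alphabetically sorted) category list.
as_category = room_type.astype("category")
codes = as_category.cat.codes

# Keep the code -> label mapping so the numbers can be read back later.
mapping = dict(enumerate(as_category.cat.categories))

print(codes.tolist())
print(mapping)
```

Codes assigned this way carry no natural order for nominal columns such as neighbourhood_cleansed, so their correlation with price should be read with care.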

In [676]:
col = ['host_response_rate', 'host_is_superhost',
       'host_total_listings_count', 'host_has_profile_pic',
       'host_identity_verified', 'neighbourhood_cleansed', 'property_type',
       'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds',
       'bed_type', 'amenities', 'price', 'guests_included', 'extra_people',
       'minimum_nights', 'maximum_nights', 'availability_30',
       'availability_60', 'availability_90', 'availability_365',
       'number_of_reviews', 'review_scores_rating', 'review_scores_accuracy',
       'review_scores_cleanliness', 'review_scores_checkin',
       'review_scores_communication', 'review_scores_location',
       'review_scores_value', 'instant_bookable', 'cancellation_policy',
       'require_guest_profile_picture', 'require_guest_phone_verification',
       'calculated_host_listings_count', 'reviews_per_month']
sns.set(style="whitegrid", color_codes=True)
sns.pairplot(listings_num.loc[(listings_num.price > 0)][col].dropna())
plt.show();
In [675]:
corr = listings_num.loc[(listings_num.price > 0)][col].dropna().corr()
plt.figure(figsize = (16,16))
sns.set(font_scale=1)
sns.heatmap(corr, cbar = True, annot=True, fmt = '.1f', xticklabels=col, yticklabels=col)
plt.show();
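With this many columns the heatmap is hard to read, so it can help to list the correlations with price directly. The sketch below ranks features by the absolute value of their Pearson correlation with price. It runs on a small made-up DataFrame named demo, and the helper rank_price_correlations is a hypothetical name introduced only for this illustration; in the notebook the same call could be applied to listings_num.

```python
import pandas as pd

# Hypothetical toy data standing in for listings_num (values are made up).
demo = pd.DataFrame({
    "accommodates": [2, 4, 6, 8, 3],
    "bedrooms": [1, 2, 3, 4, 1],
    "minimum_nights": [3, 1, 2, 1, 2],
    "price": [80.0, 150.0, 220.0, 300.0, 100.0],
})


def rank_price_correlations(df, target="price"):
    """Return Pearson correlations with `target`, strongest first."""
    # Take the target column of the correlation matrix and drop the
    # trivial self-correlation of 1.0.
    corr_with_target = df.corr()[target].drop(target)
    # Order by absolute value so strong negative correlations rank high too.
    order = corr_with_target.abs().sort_values(ascending=False).index
    return corr_with_target.loc[order]


ranked = rank_price_correlations(demo)
print(ranked)
```

Sorting by absolute value keeps strongly negative relationships, such as room_type in this dataset, near the top of the list.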
In [650]:
# Scatter plot with fitted linear regression line: price vs. accommodates
sns.lmplot(x='accommodates', y="price", data=listings)
Out[650]:
<seaborn.axisgrid.FacetGrid at 0x1a3f5ae1d0>
In [651]:
# Scatter plot with fitted linear regression lines: price vs. bedrooms, one line per accommodates value
sns.lmplot(x='bedrooms', y="price", data=listings, hue='accommodates')
Out[651]:
<seaborn.axisgrid.FacetGrid at 0x1a399fb9d0>
In [652]:
# Scatter plot with fitted linear regression line: price vs. beds
sns.lmplot(x='beds', y="price", data=listings)
Out[652]:
<seaborn.axisgrid.FacetGrid at 0x1a3a6f9c10>
In [656]:
# 1 x 2 matplotlib subplots: price distribution (left) and price by accommodates (right)
fig, (ax1, ax2) = plt.subplots(1, 2)
sns.distplot(listings.price, ax=ax1)
sns.boxplot(x='accommodates', y='price', data=listings, ax=ax2)
Out[656]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a6849cd50>
In [657]:
# 1 x 2 matplotlib subplots: price distribution (left) and price by beds (right)
fig, (ax1, ax2) = plt.subplots(1, 2)
sns.distplot(listings.price, ax=ax1)
sns.boxplot(x='beds', y='price', data=listings, ax=ax2)
Out[657]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a67fc9c50>
In [685]:
# Distribution of price (histogram with density curve)
sns.distplot(listings.price)
Out[685]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a6771e250>

6. Conclusion and Observation

Using the correlation heatmap, we found that the price is most strongly associated with the following factors:

  1. Number of people the listing accommodates
  2. Number of bedrooms
  3. Number of beds

The price tends to increase with the number of people a listing accommodates and with its number of bedrooms and beds. In the correlation matrix above, these three columns have correlations with price of about 0.66, 0.63 and 0.60 respectively.
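One of the business questions asks for the average price based on these factors. A minimal sketch of how this could be computed is a group-by mean, shown here on a small made-up DataFrame named toy; in the notebook the same expression could be applied to listings with the accommodates, bedrooms or beds column.

```python
import pandas as pd

# Hypothetical toy data standing in for the listings DataFrame
# (values are made up for illustration).
toy = pd.DataFrame({
    "accommodates": [2, 2, 4, 4, 6],
    "price": [80.0, 100.0, 150.0, 170.0, 250.0],
})

# Average price for each distinct value of the factor.
avg_price = toy.groupby("accommodates")["price"].mean()
print(avg_price)
```

The resulting Series is indexed by the factor value, so it can be passed straight to a bar plot in the same way as the host and neighbourhood counts.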
